Zero-inflation deals with response data, \(Y_i\), not predictors, \(X_i\).
Zero inflation has received the most attention for count data.
Also relevant to:
Top 4 reasons why you might get a 0 when counting critters?
Picture (and next two slides) provided by Matt Russell (formerly in FR)
Count data:
For continuous data:
How can we determine if we have excess zeros?
What do we do if we have zero-inflation?
For the in-class exercise, we will focus on the latter approach.
Group all 0’s into a single category:
Hurdle: positive counts arise if you exceed some threshold (with probability \(\pi\))
\(Z_i =\left\{\begin{array}{ll} \mbox{0 when } y=0 & \mbox{occurs with probability } (1-\pi) \\ \mbox{1 when } y>0 & \mbox{occurs with probability } \pi \end{array}\right.\)
Can model \(Z_i\) using logistic regression to allow presence-absence to depend on covariates
Model the non-zero data (using truncated distribution models)
Can do this in two steps or use a single modeling framework (see Hurdle models, Ch 11.5 in Zuur et al.).
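The two hurdle steps can be combined into a single probability mass function. The sketch below is only illustrative: it is written in Python rather than R, it assumes a Poisson distribution for the positive counts, and the function names (`poisson_pmf`, `hurdle_poisson_pmf`) and parameter values are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    """Ordinary (untruncated) Poisson probability mass function."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def hurdle_poisson_pmf(y, pi, lam):
    """P(Y = y) under a Poisson hurdle model.

    pi  : probability of crossing the hurdle (i.e., a positive count)
    lam : rate parameter of the Poisson that is truncated at zero
    """
    if y == 0:
        # All zeros fall into a single category
        return 1.0 - pi
    # Positive counts: hurdle probability times the zero-truncated Poisson
    return pi * poisson_pmf(y, lam) / (1.0 - poisson_pmf(0, lam))

# Illustrative values only (not taken from any data set)
probs = [hurdle_poisson_pmf(y, pi=0.3, lam=2.0) for y in range(60)]
total = sum(probs)  # should be numerically 1
```

Because the zeros and the positive counts are modeled by separate pieces, the probability of a zero here depends only on `pi`, not on `lam`.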
Truncated distributions for non-zero count data:
\(P(Y = y | Y > 0) = \frac{P(Y=y)}{P(Y>0)}= \frac{f(y)}{(1-f(0))}\)
Remember, \(P(A|B) = P(A \mbox{ and } B)/P(B)\)
A truncated Poisson would look like…
\[P(Y=y | Y > 0) = \frac{\frac{e^{-\lambda}\lambda^y}{y!}}{1-e^{-\lambda}}\]
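As a quick numerical check of the truncated Poisson formula, the sketch below (Python, standard library only; the function name `truncated_poisson_pmf` and the value of `lam` are hypothetical) confirms that the probabilities over \(y = 1, 2, 3, \ldots\) sum to 1, and that the mean of the truncated distribution is \(\lambda/(1-e^{-\lambda})\) rather than \(\lambda\).

```python
import math

def truncated_poisson_pmf(y, lam):
    """P(Y = y | Y > 0) for a Poisson with rate lam."""
    if y < 1:
        return 0.0  # zero is excluded from the support
    poisson = math.exp(-lam) * lam**y / math.factorial(y)
    return poisson / (1.0 - math.exp(-lam))

lam = 1.5                 # illustrative rate
support = range(1, 80)    # upper tail beyond 80 is negligible for this lam

total = sum(truncated_poisson_pmf(y, lam) for y in support)
mean = sum(y * truncated_poisson_pmf(y, lam) for y in support)
expected_mean = lam / (1.0 - math.exp(-lam))
```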
We can incorporate covariates, using: \(\log(\lambda) = \beta_0 + \beta_1x +\ldots\)
Note, however:
For continuous data:
Which function in R is used to determine \(F(Y)\)? pnorm!
Two ways to get a 0:
Zero-inflation:
Assigning meaning to the zero-inflation process can in some cases be useful, but it also requires a leap of faith!
See comments on this blog
Probability Mass Function: \(f(y) = \frac{e^{-\lambda}\lambda^y}{y!}\)
Let: \(\pi\) be the probability of a zero-inflated response
ZIP model (Zuur):
\(P(Y=y) = f(y) =\left\{\begin{array}{ll} \pi + (1-\pi)e^{-\lambda} & \mbox{if } y = 0\\ (1-\pi)\frac{e^{-\lambda}\lambda^y}{y!} & \mbox{if } y = 1, 2, 3, \ldots \end{array} \right.\)
Get a 0 two ways:
Non-zero responses: \((1-\pi)f(y)\)
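The ZIP probabilities in the Zuur form can be sketched in a few lines. This is an illustrative Python example rather than the course's R code; the function name `zip_pmf` and the parameter values are hypothetical.

```python
import math

def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson pmf, with pi = P(zero-inflated response)."""
    poisson = math.exp(-lam) * lam**y / math.factorial(y)
    if y == 0:
        # A zero arises from the inflation process OR from the Poisson itself
        return pi + (1.0 - pi) * poisson
    # Non-zero responses can only come from the Poisson part
    return (1.0 - pi) * poisson

pi, lam = 0.25, 3.0       # illustrative values
probs = [zip_pmf(y, pi, lam) for y in range(80)]

total = sum(probs)                              # should be numerically 1
mean = sum(y * p for y, p in enumerate(probs))  # equals (1 - pi) * lam
p_zero_poisson = math.exp(-lam)                 # zero probability without inflation
```

Comparing `probs[0]` with `p_zero_poisson` shows how much extra mass the inflation process places on zero.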
Zuur and zeroinfl function in pscl R package:
Kery:
ZIP model (Zuur and zeroinfl):
\(P(Y=y) = f(y) =\left\{\begin{array}{ll} \pi + (1-\pi)e^{-\lambda} & \mbox{if } y = 0\\ (1-\pi)\frac{e^{-\lambda}\lambda^y}{y!} & \mbox{if } y = 1, 2, 3, \ldots \end{array} \right.\)
ZIP model (Kery):
\(P(Y=y) = f(y) =\left\{\begin{array}{ll} 1-\psi + \psi e^{-\lambda} & \mbox{if } y = 0\\ \psi\frac{e^{-\lambda}\lambda^y}{y!} & \mbox{if } y = 1, 2, 3, \ldots \end{array} \right.\)
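The two ZIP parameterizations describe the same distribution once \(\psi = 1 - \pi\). The sketch below checks this numerically; it is illustrative Python, and the function names (`zip_pmf_zuur`, `zip_pmf_kery`) and parameter values are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam**y / math.factorial(y)

def zip_pmf_zuur(y, pi, lam):
    """Zuur / zeroinfl form: pi is the probability of a zero-inflated response."""
    if y == 0:
        return pi + (1.0 - pi) * poisson_pmf(0, lam)
    return (1.0 - pi) * poisson_pmf(y, lam)

def zip_pmf_kery(y, psi, lam):
    """Kery form: psi is the probability of a response from the Poisson part."""
    if y == 0:
        return 1.0 - psi + psi * poisson_pmf(0, lam)
    return psi * poisson_pmf(y, lam)

pi, lam = 0.2, 1.7        # illustrative values
psi = 1.0 - pi            # complement links the two parameterizations

max_diff = max(
    abs(zip_pmf_zuur(y, pi, lam) - zip_pmf_kery(y, psi, lam))
    for y in range(30)
)
```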
Probability Mass Function: \(f(y) = {y+\theta-1 \choose y}\left(\frac{\theta}{\mu+\theta}\right)^{\theta}\left(\frac{\mu}{\mu+\theta}\right)^y\)
ZINB model (Zuur et al):
\(f(y) =\left\{ \begin{array}{ll} \pi + (1-\pi)\left(\frac{\theta}{\mu+\theta}\right)^\theta & \mbox{if } y = 0\\ (1-\pi){y+\theta-1 \choose y}\left(\frac{\theta}{\mu+\theta}\right)^{\theta}\left(\frac{\mu}{\mu+\theta}\right)^y & \mbox{if } y = 1, 2, 3, \ldots \end{array} \right.\)
ZINB model (Kery):
\(f(y) =\left\{ \begin{array}{ll} 1-\psi + \psi\left(\frac{\theta}{\mu+\theta}\right)^\theta & \mbox{if } y = 0\\ \psi{y+\theta-1 \choose y}\left(\frac{\theta}{\mu+\theta}\right)^{\theta}\left(\frac{\mu}{\mu+\theta}\right)^y & \mbox{if } y = 1, 2, 3, \ldots \end{array} \right.\)
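The ZINB probabilities can be computed the same way as for the ZIP. The sketch below implements the Zuur et al. form in Python (standard library only), using log-gamma functions for the generalized binomial coefficient since \(\theta\) need not be an integer. The function names (`nb_pmf`, `zinb_pmf`) and parameter values are hypothetical; the Kery form follows by replacing the mixing probability with its complement.

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and dispersion parameter theta."""
    # choose(y + theta - 1, y) = Gamma(y + theta) / (Gamma(theta) * Gamma(y + 1))
    log_coef = math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
    log_p = (log_coef
             + theta * math.log(theta / (mu + theta))
             + y * math.log(mu / (mu + theta)))
    return math.exp(log_p)

def zinb_pmf(y, pi, mu, theta):
    """Zero-inflated negative binomial pmf, pi = P(zero-inflated response)."""
    if y == 0:
        return pi + (1.0 - pi) * nb_pmf(0, mu, theta)
    return (1.0 - pi) * nb_pmf(y, mu, theta)

pi, mu, theta = 0.3, 2.0, 1.5     # illustrative values
probs = [zinb_pmf(y, pi, mu, theta) for y in range(400)]
total = sum(probs)                # should be numerically 1
```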
We can use the zeroinfl function in the pscl package in R to fit:
Can also code models in JAGS (see Kery Ch 14) and fit using other packages (e.g. glmmTMB)
Remember:
zeroinfl: models the probability of a zero-inflated response (i.e., a “false” zero) = \(\pi_i\)
As a result, the sign of the coefficients will differ between the two approaches (zeroinfl's \(\pi\) versus Kery's \(\psi = 1-\pi\)).
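The sign change can be seen directly on the logit scale. The short derivation below assumes both approaches use a logit link for the mixing probability; the coefficient symbols \(\gamma_0, \gamma_1\) are introduced here only for illustration.

\[\mbox{logit}(\psi) = \log\left(\frac{\psi}{1-\psi}\right) = \log\left(\frac{1-\pi}{\pi}\right) = -\log\left(\frac{\pi}{1-\pi}\right) = -\mbox{logit}(\pi)\]

So if \(\mbox{logit}(\pi) = \gamma_0 + \gamma_1 x\), then \(\mbox{logit}(\psi) = -\gamma_0 - \gamma_1 x\): the estimates have the same magnitude but opposite signs.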
Can compare Poisson, Negative Binomial, Zero-inflation models
My experience, and that of others, is that a Negative Binomial model (without zero-inflation) often “wins” (but not always)
Also, zero-inflated negative binomial models can sometimes be difficult to fit (past homework problem)